Abstract: THIS PAPER IS ELIGIBLE FOR THE STUDENT PAPER AWARD. We derive an upper bound on the capacity of non-binary deletion channels. Although binary deletion channels have received significant attention over the years, and many upper and lower bounds on their capacity have been derived, such studies for the non-binary case are largely missing. The state of the art is the following: as a trivial upper bound, the capacity of an erasure channel with the same input alphabet as the deletion channel can be used, and as a lower bound the results of [1] are available. In this paper, we derive the first non-trivial non-binary deletion channel capacity upper bounds and reduce the gap between the existing achievable rates and upper bounds. To derive the new upper bounds we first prove an inequality between the capacity of a $2K$-ary deletion channel with deletion probability $d$, denoted by $C_{2K}(d)$, and the capacity of the binary deletion channel with the same deletion probability, $C_2(d)$, that is, $C_{2K}(d) \le C_2(d) + (1-d)\log(K)$. Then by employing the existing upper bounds on the capacity of the binary deletion channel, we obtain upper bounds on the capacity of the $2K$-ary deletion channel. We illustrate via examples the use of the new bounds and discuss their asymptotic behavior as $d \to 0$.

I. Introduction 

Non-binary independent and identically distributed (i.i.d.) deletion channels can be used to model information transmission over a finite buffer channel [1], where a packet (non-binary symbol) loss occurs whenever a packet arrives at a full buffer. Dobrushin [2] proved Shannon's theorem for discrete memoryless channels with synchronization errors. As a result, Shannon's theorem holds for non-binary deletion channels, and their information and transmission capacities are equal.

In this paper, we focus on a $2K$-ary deletion channel $\mathcal{C}$ in which every transmitted symbol is either lost in transmission with probability $d$ or received correctly with probability $1-d$. Neither the transmitter nor the receiver has any information about the positions of the lost symbols. We present a non-trivial upper bound on the capacity of this channel. Clearly, the capacity of a $2K$-ary erasure channel with erasure probability $d$ is an upper bound on the capacity of the $2K$-ary deletion channel, since revealing the positions of the lost symbols to the receiver yields a genie-aided deletion channel that is nothing but an erasure channel. Therefore, for the capacity of the $2K$-ary input deletion channel $C_{2K}(d)$, the relation $C_{2K}(d) \le (1-d)\log(2K)$ holds. Besides this trivial upper bound, to the best of our knowledge, there are no other (tighter) upper bounds on the capacity of non-binary deletion channels.

Our main result relates the capacity of a $2K$-ary deletion channel with deletion probability $d$ to the capacity of the binary deletion channel with the same deletion probability through the inequality $C_{2K}(d) \le C_2(d) + (1-d)\log(K)$. As a result, any upper bound on the binary deletion channel capacity can be used to derive an upper bound on the $2K$-ary deletion channel capacity. For example, by using the binary deletion channel upper bound $C_2(d) \le 0.4143(1-d)$ for $d \ge 0.65$ (see Section II), we obtain $C_{2K}(d) \le (\log(K) + 0.4143)(1-d)$ for $d \ge 0.65$.
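
As a quick numerical illustration of the example above, the following Python sketch evaluates the closed-form bound for $d \ge 0.65$ next to the trivial erasure channel bound. It assumes that all logarithms are taken to base 2 (rates in bits per channel use); the function names are hypothetical and are not part of the paper.

```python
import math


def erasure_upper_bound(d: float, K: int) -> float:
    """Trivial bound (1 - d) * log2(2K) for the 2K-ary deletion channel."""
    return (1.0 - d) * math.log2(2 * K)


def new_upper_bound_high_d(d: float, K: int) -> float:
    """Bound (log2(K) + 0.4143) * (1 - d); stated only for d >= 0.65."""
    if not 0.65 <= d <= 1.0:
        raise ValueError("this closed form is stated only for 0.65 <= d <= 1")
    return (math.log2(K) + 0.4143) * (1.0 - d)


if __name__ == "__main__":
    d = 0.8
    for K in (2, 4):  # 4-ary and 8-ary alphabets
        print(2 * K, erasure_upper_bound(d, K), new_upper_bound_high_d(d, K))
```

Under these assumptions the two bounds differ by $(1 - 0.4143)(1-d)$ bits, independently of $K$.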

The paper is organized as follows. In Section II, we briefly review the existing work on the capacity of binary and non-binary deletion channels. In Section III, we first give the general $2K$-ary deletion channel model and then observe that it can be considered as a parallel concatenation of $K$ independent deletion channels (where each input is binary). In the same section, we also discuss the possible generalization of the existing Blahut-Arimoto algorithm (BAA) based upper bounding approaches (useful for the binary deletion channels) to the case of $2K$-ary deletion channels. In Section IV, we prove the main result of the paper, providing an upper bound on $C_{2K}(d)$ in terms of $C_2(d)$. In Section V, several implications of the result are given: we compare the resulting capacity upper bounds with the existing capacity upper and lower bounds, and we provide a discussion of the channel capacity behavior as the deletion probability approaches zero. Finally, we conclude the paper in Section VI.

II. Previous Works 

The capacity of binary deletion channels has received significant attention in the existing literature. There are several results on capacity lower bounds [5]-[7]. Gallager [5] provided the first lower bound on the transmission capacity of channels with random insertion, deletion and substitution errors, which provides a lower bound on the binary deletion channel capacity as well. The tightest lower bound on the binary deletion channel capacity is provided in [7], where the information capacity of the binary deletion channel is directly lower bounded by considering input sequences as alternating blocks of zeros and ones (runs) and the lengths of the runs $L$ as i.i.d. random variables following a particular distribution over the positive integers with a finite expectation and finite entropy.

There are also several capacity upper bounds on the binary deletion channel, e.g., [8], [9]. In [8], a genie-aided channel is considered in which the receiver is provided with side information about the completely deleted runs. For example, when '110001' is transmitted over the original channel and the entire run of zeros is deleted, the sequence '111' is received, whereas over the considered genie-aided channel '11-1' represents the possible reception. An upper bound on the capacity per unit cost of the genie-aided channel is then computed by running the BAA. Fertonani and Duman [9], by considering several different genie-aided channels, derive upper bounds on the binary deletion channel capacity that are tighter than the results in [8] for $d > 0.05$. The upper bounds of [9] are in turn improved for $d \ge 0.65$ by first deriving an inequality relating the capacities of three different binary deletion channels; as a special case, this gives $C_2(\lambda d + 1 - \lambda) \le \lambda C_2(d)$, which shows that $C_2(d) \le 0.4143(1-d)$ for $d \ge 0.65$.

To the best of our knowledge, the only non-trivial lower bounds on the capacity of non-binary deletion channels are provided in [1], where two different bounds are derived. More precisely, the achievable rates of the $2K$-ary input deletion channel are computed for i.i.d. and Markovian codebooks by considering a simple decoder which decides in favor of a sequence if the received sequence is a subsequence of only one transmitted sequence. The derived achievable rates are given by

$$C_{2K} \ge \log\left(\frac{2K}{2K-1}\right) + (1-d)\log(2K-1) - H_b(d) \tag{1}$$

by considering i.i.d. codebooks, where $H_b(d) = -d\log(d) - (1-d)\log(1-d)$, and

$$C_{2K} \ge \sup_{\gamma>0,\ 0<p<1} \left[ -(1-d)\log\big((1-q)A + qB\big) - \gamma\log(e) \right] \tag{2}$$

by considering Markovian codebooks, with $A$ as defined in [1], $B = e^{-\gamma}\big((1-p)A + p\big)$ and $q = \frac{1}{2K}\left(1 + \frac{(1-d)(2K-1)(2Kp-1)}{2K-1-d(2Kp-1)}\right)$. Non-binary input alphabet channels with synchronization errors are also considered in [10], where the capacity of memoryless synchronization error channels in the presence of noise has been studied. The main focus of the work in [10] is on the asymptotic behavior of the channel capacity for large values of $K$.
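
To make the i.i.d. codebook rate (1) concrete, the sketch below evaluates it numerically. It assumes base-2 logarithms and reads (1) as $\log(2K/(2K-1)) + (1-d)\log(2K-1) - H_b(d)$; the clamping at zero and the function names are illustrative choices, not taken from [1].

```python
import math


def binary_entropy(d: float) -> float:
    """H_b(d) in bits, with the convention 0 * log(0) = 0."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * math.log2(d) - (1.0 - d) * math.log2(1.0 - d)


def iid_lower_bound(d: float, K: int) -> float:
    """Achievable rate (1) for the 2K-ary deletion channel, clamped at 0
    because a capacity lower bound below zero carries no information."""
    q = 2 * K
    rate = math.log2(q / (q - 1)) + (1.0 - d) * math.log2(q - 1) - binary_entropy(d)
    return max(rate, 0.0)


if __name__ == "__main__":
    for d in (0.0, 0.1, 0.5):
        print(d, iid_lower_bound(d, K=2), (1.0 - d) * math.log2(4))
```

For $K = 1$ the expression reduces to $1 - H_b(d)$, and for $d = 0$ it equals $\log(2K)$, the capacity of the noiseless $2K$-ary channel.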

III. Preliminaries 

A. Channel Model 

An i.i.d. $2K$-ary deletion channel with input alphabet $\mathcal{X} = \{1, \ldots, 2K\}$ is considered, in which every transmitted symbol is either randomly deleted with probability $d$ or received correctly with probability $1-d$, while neither the transmitter nor the receiver has any information about the lost symbols or their positions. In the transmission of $N$ symbols through the channel, the input sequence is denoted by $\mathbf{X} = (x_1, \ldots, x_N)$, in which $x_n \in \mathcal{X}$ and $\mathbf{X} \in \mathcal{X}^N$, and the output sequence is denoted by $\mathbf{Y} = (y_1, \ldots, y_M)$, in which $M$ is a binomial random variable with parameters $N$ and $1-d$ (due to the characteristics of the i.i.d. deletion channel).

Fig. 1. $2K$-ary deletion channel as a parallel concatenation of $K$ independent binary input deletion channels.
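
The channel model can be simulated in a few lines. The following Python sketch is only an illustration of the definition above; the function name and the chosen parameters are assumptions made for this example.

```python
import random


def deletion_channel(x, d, rng):
    """Pass the sequence x through an i.i.d. deletion channel.

    Each symbol is dropped independently with probability d; the surviving
    symbols keep their order and no deletion positions are reported.
    """
    return [s for s in x if rng.random() >= d]


if __name__ == "__main__":
    rng = random.Random(0)
    K, N, d = 2, 20, 0.3
    x = [rng.randint(1, 2 * K) for _ in range(N)]  # symbols from {1, ..., 2K}
    y = deletion_channel(x, d, rng)
    print(x)
    print(y, len(y))
```

The output length `len(y)` plays the role of $M$, and the receiver sees only the surviving symbols.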

1) A Different Look at the $2K$-ary Deletion Channel: Any $2K$-ary input deletion channel with deletion probability $d$ can be considered as a parallel concatenation of $K$ independent binary deletion channels $\mathcal{C}_k$ ($k \in \{1, \ldots, K\}$), all with the same deletion probability $d$, as shown in Fig. 1, in which the input symbols $2k-1$ and $2k$ travel through $\mathcal{C}_k$ and the surviving output symbols of the subchannels are combined based on the order in which they go through the subchannels. $\mathbf{X}_k$ and $\mathbf{Y}_k$ denote the input and output sequences of the $k$-th channel, respectively, and $N_k$ and $M_k$ denote the lengths of $\mathbf{X}_k$ and $\mathbf{Y}_k$, respectively.

To be able to relate the mutual information between the input and output sequences of the $2K$-ary deletion channel, $I(\mathbf{X};\mathbf{Y})$, with the mutual information between the input and output sequences of the considered binary deletion channels, $I(\mathbf{X}_k;\mathbf{Y}_k)$, we define two new random vectors $\mathbf{F}_x$ and $\mathbf{F}_y$. More precisely, $\mathbf{F}_x = (f_x[1], \ldots, f_x[N])$ and $\mathbf{F}_y = (f_y[1], \ldots, f_y[M])$ such that $f_x[n] \in \{1, \ldots, K\}$ and $f_y[m] \in \{1, \ldots, K\}$ denote the label of the subchannel the $n$-th input symbol and the $m$-th output symbol belong to, respectively. Clearly, by knowing $\mathbf{X}$, one can determine $(\mathbf{X}_1, \ldots, \mathbf{X}_K, \mathbf{F}_x)$, and by knowing $(\mathbf{X}_1, \ldots, \mathbf{X}_K, \mathbf{F}_x)$, one can determine $\mathbf{X}$. The same holds for $\mathbf{Y}$ and $(\mathbf{Y}_1, \ldots, \mathbf{Y}_K, \mathbf{F}_y)$. Therefore, we have

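$$I(\mathbf{X};\mathbf{Y}) = I(\mathbf{X}_1,\ldots,\mathbf{X}_K,\mathbf{F}_x;\ \mathbf{Y}_1,\ldots,\mathbf{Y}_K,\mathbf{F}_y) = \sum_{k=1}^{K} I_k + I_F, \tag{3}$$

where

$$I_k = I(\mathbf{X}_1,\ldots,\mathbf{X}_K,\mathbf{F}_x;\ \mathbf{Y}_k \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_{k-1}) \quad \text{and} \quad I_F = I(\mathbf{X}_1,\ldots,\mathbf{X}_K,\mathbf{F}_x;\ \mathbf{F}_y \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_K). \tag{4}$$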

In Section IV, we derive upper bounds on $I_k$ and $I_F$, which enable us to relate the capacities of the non-binary and binary deletion channels and lead to the main result of the paper.
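
The one-to-one correspondence between a sequence and its subchannel sequences plus label vector can be checked with a short Python sketch. The function names are hypothetical; the mapping of symbols $2k-1$ and $2k$ to subchannel $k$ follows the description above.

```python
def split_into_subchannels(x, K):
    """Map X to (X_1, ..., X_K, F_x).

    subsequences[k - 1] collects the symbols of x that travel through
    subchannel k (symbols 2k-1 and 2k); labels[n] is the subchannel index
    of the n-th symbol of x.
    """
    subsequences = [[] for _ in range(K)]
    labels = []
    for s in x:
        k = (s + 1) // 2  # symbols 2k-1 and 2k both map to subchannel k
        subsequences[k - 1].append(s)
        labels.append(k)
    return subsequences, labels


def merge_subchannels(subsequences, labels):
    """Inverse map: rebuild X from (X_1, ..., X_K, F_x)."""
    positions = [0] * len(subsequences)
    x = []
    for k in labels:
        x.append(subsequences[k - 1][positions[k - 1]])
        positions[k - 1] += 1
    return x


if __name__ == "__main__":
    x = [1, 4, 3, 2, 2, 4]
    subs, labels = split_into_subchannels(x, K=2)
    print(subs, labels, merge_subchannels(subs, labels) == x)
```

The same pair of functions applies to the output side, with the received sequence in place of `x`.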

B. Discussion on the BAA Based Upper Bounds 

One approach to deriving upper bounds on the $2K$-ary deletion channel capacity is to modify the numerical approaches in [8], [9], in which the decoder (and possibly the encoder) of the binary deletion channel is provided with some side information about the deletion process and the capacity (or an upper bound on the capacity) of the resulting genie-aided channel is numerically evaluated by running the BAA. Although this approach is useful for binary channels (even when other impairments such as insertions and substitutions are considered [11]), running the BAA for large values of $K$ is not computationally feasible in the non-binary case. For example, one of the upper bounds in [9] is obtained by computing the capacity of the binary deletion channel with a finite transmission length of $L = 17$. Obviously, as the alphabet size $2K$ increases, the maximum value of $L$ for which the BAA can be run decreases. Since reasonably tight upper bounds require large $L$, the numerical computations become infeasible.
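
The growth of the problem size can be illustrated with a rough count. The sketch below simply counts the input sequences of length $L$ over a $2K$-ary alphabet, which is the number of input letters a direct BAA run would have to handle; it is an assumption of this example that this count is the limiting factor, and the function names are hypothetical.

```python
def num_input_sequences(K: int, L: int) -> int:
    """Number of length-L input sequences over a 2K-ary alphabet."""
    return (2 * K) ** L


def max_length_within_budget(K: int, budget: int) -> int:
    """Largest L such that the number of input sequences stays <= budget."""
    L = 0
    while (2 * K) ** (L + 1) <= budget:
        L += 1
    return L


if __name__ == "__main__":
    budget = num_input_sequences(1, 17)  # binary case with L = 17
    for K in (1, 2, 4):
        print(2 * K, max_length_within_budget(K, budget))
```

With the same budget as the binary $L = 17$ case, the feasible length drops quickly as the alphabet grows.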

The main contribution of the present paper is that we relate the capacity of the $2K$-ary deletion channel to the binary deletion channel capacity through an inequality, which enables us to upper bound the $2K$-ary deletion channel capacity without running the computationally formidable BAA directly for the $2K$-ary deletion channel.

IV. A Novel Upper Bound on $C_{2K}(d)$

As introduced in Section III-A, a $2K$-ary deletion channel can be considered as a parallel concatenation of $K$ independent binary deletion channels. This new look at a $2K$-ary deletion channel enables us to relate the $2K$-ary deletion channel capacity to the binary deletion channel capacity with the same deletion probability, as given in the following theorem.

Theorem 1. Let $C_{2K}(d)$ denote the capacity of a $2K$-ary deletion channel with deletion probability $d$. Then

$$C_{2K}(d) \le C_2(d) + (1-d)\log(K). \tag{5}$$



As given in (3), the mutual information $I(\mathbf{X};\mathbf{Y})$ can be expanded in terms of several other mutual information terms, $I_k$ for $k \in \{1, \ldots, K\}$ and $I_F$. To prove Theorem 1, we first derive upper bounds on $I_k$ and $I_F$ in the following two lemmas, which enable us to complete the proof of Theorem 1.

Lemma 1. For all possible input distributions $P(\mathbf{X}_1, \ldots, \mathbf{X}_K, \mathbf{F}_x)$, the mutual information $I_k$ given in (4) can be upper bounded by

$$I_k \le E\{N_k\}\,C_2(d) + 2\log(N+1),$$

where $E\{\cdot\}$ denotes the expected value.

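Proof: For $I_k$, since $P(\mathbf{Y}_k \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_{k-1},\mathbf{X}_k) = P(\mathbf{Y}_k \mid \mathbf{X}_k)$ and $P(\mathbf{Y}_k \mid \mathbf{X}_1,\ldots,\mathbf{X}_K,\mathbf{F}_x,\mathbf{Y}_1,\ldots,\mathbf{Y}_{k-1}) = P(\mathbf{Y}_k \mid \mathbf{X}_k)$, we can write

$$\begin{aligned}
I_k &= I(\mathbf{X}_k;\mathbf{Y}_k \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_{k-1}) \\
&\quad + I(\mathbf{X}_1,\ldots,\mathbf{X}_{k-1},\mathbf{X}_{k+1},\ldots,\mathbf{X}_K,\mathbf{F}_x;\ \mathbf{Y}_k \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_{k-1},\mathbf{X}_k) \\
&= I(\mathbf{X}_k;\mathbf{Y}_k \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_{k-1}) \\
&= H(\mathbf{Y}_k \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_{k-1}) - H(\mathbf{Y}_k \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_{k-1},\mathbf{X}_k) \\
&= H(\mathbf{Y}_k) - I(\mathbf{Y}_1,\ldots,\mathbf{Y}_{k-1};\mathbf{Y}_k) - H(\mathbf{Y}_k \mid \mathbf{X}_k) \\
&\le I(\mathbf{X}_k;\mathbf{Y}_k).
\end{aligned} \tag{6}$$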



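Furthermore, $I(\mathbf{X}_k;\mathbf{Y}_k)$ can be written as

$$\begin{aligned}
I(\mathbf{X}_k;\mathbf{Y}_k) &= I(\mathbf{X}_k;\mathbf{Y}_k,N_k) - I(\mathbf{X}_k;N_k \mid \mathbf{Y}_k) \\
&= I(\mathbf{X}_k;\mathbf{Y}_k \mid N_k) + I(\mathbf{X}_k;N_k) - I(\mathbf{X}_k;N_k \mid \mathbf{Y}_k).
\end{aligned}$$

Since $H(N_k \mid \mathbf{X}_k) = 0$ and $I(\mathbf{X}_k;N_k \mid \mathbf{Y}_k) \ge 0$, we arrive at

$$\begin{aligned}
I(\mathbf{X}_k;\mathbf{Y}_k) &\le I(\mathbf{X}_k;\mathbf{Y}_k \mid N_k) + H(N_k) \\
&\le I(\mathbf{X}_k;\mathbf{Y}_k \mid N_k) + \log(N+1) \\
&= \sum_{n_k=0}^{N} P(N_k = n_k)\, I(\mathbf{X}_k;\mathbf{Y}_k \mid N_k = n_k) + \log(N+1),
\end{aligned} \tag{7}$$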

where the second inequality results since there are $N+1$ possibilities for $N_k$ and, as a result, $H(N_k) \le \log(N+1)$. Furthermore, as shown in [9], for a finite length transmission over the deletion channel, the mutual information between the transmitted and received sequences can be upper bounded in terms of the capacity of the channel after adding an appropriate term, which can be spelled out as [9, Eqn. (39)]



$$I(\mathbf{X}_k;\mathbf{Y}_k \mid N_k = n_k) \le n_k C_2(d) + H(D_k \mid N_k = n_k), \tag{8}$$

where $D_k$ denotes the number of deletions in the transmission of $N_k$ bits over the $k$-th channel and

$$H(D_k \mid N_k = n_k) = -\sum_{n=0}^{n_k} P(n_k,n,d)\log\big(P(n_k,n,d)\big) \le \log(n_k+1) \le \log(N+1), \tag{9}$$

with $P(n_k,n,d) = \binom{n_k}{n} d^n (1-d)^{n_k-n}$. Substituting (8) and (9) into (7), we obtain

$$\begin{aligned}
I(\mathbf{X}_k;\mathbf{Y}_k) &\le \sum_{n_k=0}^{N} P(N_k = n_k)\,\big(n_k C_2(d)\big) + 2\log(N+1) \\
&= E\{N_k\}\,C_2(d) + 2\log(N+1).
\end{aligned}$$

Finally, by substituting the above inequality in (6), the proof follows. ■
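
The entropy bound (9) is easy to verify numerically. The following Python sketch computes the entropy of the binomial deletion count and compares it with $\log(n_k+1)$; base-2 logarithms and the function name are assumptions of this example.

```python
import math


def deletion_count_entropy(n_k: int, d: float) -> float:
    """Entropy in bits of D_k ~ Binomial(n_k, d), as in (9)."""
    h = 0.0
    for n in range(n_k + 1):
        p = math.comb(n_k, n) * d**n * (1.0 - d) ** (n_k - n)
        if p > 0.0:  # skip zero-probability terms (0 * log 0 = 0)
            h -= p * math.log2(p)
    return h


if __name__ == "__main__":
    for n_k in (1, 5, 20, 100):
        print(n_k, deletion_count_entropy(n_k, 0.3), math.log2(n_k + 1))
```

The gap between the two columns grows with $n_k$, which shows that the $\log(N+1)$ term is a loose but sufficient bound for the proof.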

Lemma 2. For all possible input distributions, the mutual information $I_F$ given in (4) can be upper bounded by

$$I_F \le N(1-d)\log(K).$$

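Proof: Using the definition of the mutual information, we can write

$$\begin{aligned}
I_F &= H(\mathbf{F}_y \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_K) - H(\mathbf{F}_y \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_K,\mathbf{X}_1,\ldots,\mathbf{X}_K,\mathbf{F}_x) \\
&\le H(\mathbf{F}_y \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_K) \\
&\le H(\mathbf{F}_y \mid M_1,\ldots,M_K),
\end{aligned} \tag{10}$$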

where the last inequality follows since $(M_1,\ldots,M_K)$ is a function of $(\mathbf{Y}_1,\ldots,\mathbf{Y}_K)$, i.e., $H(M_1,\ldots,M_K \mid \mathbf{Y}_1,\ldots,\mathbf{Y}_K) = 0$. For fixed $m_k$ such that $\sum_{k=1}^{K} m_k = m$, there are $\binom{m}{m_1,\ldots,m_K}$ possibilities for $\mathbf{F}_y$, leading to $H(\mathbf{F}_y \mid m_1,\ldots,m_K) \le \log\binom{m}{m_1,\ldots,m_K}$. It follows from the inequality (see Appendix A)

$$\log\binom{m}{m_1,\ldots,m_K} \le m\log(m) - \sum_{k=1}^{K} m_k\log(m_k) \tag{11}$$

that $H(\mathbf{F}_y \mid m_1,\ldots,m_K) \le m\log(m) - \sum_{k=1}^{K} m_k\log(m_k)$. Furthermore, since $g([m_1,\ldots,m_K]) = \left(\sum_{k=1}^{K} m_k\right)\log\left(\sum_{k=1}^{K} m_k\right) - \sum_{k=1}^{K} m_k\log(m_k)$ is a concave function of $[m_1,\ldots,m_K]$ (see Appendix B), employing Jensen's inequality yields

$$I_F \le \left(\sum_{k=1}^{K} E\{M_k\}\right)\log\left(\sum_{k=1}^{K} E\{M_k\}\right) - \sum_{k=1}^{K} E\{M_k\}\log\big(E\{M_k\}\big).$$



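On the other hand, since the $\mathcal{C}_k$ are independent binary input deletion channels with the same deletion probability $d$, we have $E\{M_k\} = (1-d)E\{N_k\} = N(1-d)\alpha_k$, where $\alpha_k = E\{N_k\}/N$ depends on the input distribution $P(\mathbf{X})$ and $\sum_{k=1}^{K} \alpha_k = 1$. Hence, we obtain

$$\begin{aligned}
I_F &\le N(1-d)\left[\log\big(N(1-d)\big) - \sum_{k=1}^{K} \alpha_k\log\big(N(1-d)\alpha_k\big)\right] \\
&= -N(1-d)\sum_{k=1}^{K} \alpha_k\log\alpha_k = N(1-d)H(\alpha_1,\ldots,\alpha_K) \\
&\le N(1-d)\log(K),
\end{aligned} \tag{12}$$

which concludes the proof. ■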
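
The chain of steps in (12) can be checked numerically for a specific choice of the fractions $\alpha_k$. The sketch below assumes base-2 logarithms; the values of $N$, $d$ and $\alpha_k$ are hypothetical and chosen only for illustration.

```python
import math


def xlogx(v: float) -> float:
    """v * log2(v) with the convention 0 * log 0 = 0."""
    return v * math.log2(v) if v > 0 else 0.0


def g(m):
    """g(m) = (sum m_k) log(sum m_k) - sum m_k log(m_k)."""
    return xlogx(sum(m)) - sum(xlogx(v) for v in m)


def entropy(alpha):
    """Entropy in bits of the distribution alpha."""
    return -sum(xlogx(a) for a in alpha)


N, d = 1000, 0.3
alpha = [0.5, 0.3, 0.2]  # hypothetical fractions E{N_k}/N for K = 3
expected_m = [N * (1 - d) * a for a in alpha]  # E{M_k} = N (1 - d) alpha_k

jensen_term = g(expected_m)                       # bound after Jensen's step
entropy_term = N * (1 - d) * entropy(alpha)       # N (1 - d) H(alpha)
log_k_term = N * (1 - d) * math.log2(len(alpha))  # N (1 - d) log K

if __name__ == "__main__":
    print(jensen_term, entropy_term, log_k_term)
```

The first two quantities coincide, and both stay below $N(1-d)\log(K)$, with equality only for uniform $\alpha_k$.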

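A. Proof of Theorem 1

Substituting the results of Lemmas 1 and 2 in (3), we obtain

$$\begin{aligned}
I(\mathbf{X};\mathbf{Y}) &\le E_{N_1,\ldots,N_K}\left\{\sum_{k=1}^{K} N_k\right\} C_2(d) + 2K\log(N+1) + N(1-d)\log(K) \\
&= N C_2(d) + 2K\log(N+1) + N(1-d)\log(K),
\end{aligned}$$

where we have used the fact that $\sum_{k=1}^{K} N_k = N$ independent of the input distribution $P(\mathbf{X})$. Since the above inequality holds for any input distribution $P(\mathbf{X})$ and any value of $N$, and since $2K\log(N+1)/N$ vanishes as $N \to \infty$, we can write

$$C_{2K}(d) = \lim_{N\to\infty} \max_{P(\mathbf{X})} \frac{1}{N} I(\mathbf{X};\mathbf{Y}) \le C_2(d) + (1-d)\log(K),$$

which concludes the proof of Theorem 1. □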
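
The role of the $2K\log(N+1)$ term in the limit can be seen numerically. The sketch below evaluates the per-symbol finite-length bound for increasing $N$; base-2 logarithms, the function names, and the value passed for $C_2(d)$ are assumptions of this example.

```python
import math


def finite_length_bound(N: int, K: int, d: float, c2: float) -> float:
    """Per-symbol bound C_2(d) + 2K log(N+1)/N + (1-d) log(K) on I(X;Y)/N.

    c2 is any value (or upper bound) used in place of C_2(d).
    """
    return c2 + 2 * K * math.log2(N + 1) / N + (1.0 - d) * math.log2(K)


def limiting_bound(K: int, d: float, c2: float) -> float:
    """Limit of the finite-length bound as N grows: C_2(d) + (1-d) log(K)."""
    return c2 + (1.0 - d) * math.log2(K)


if __name__ == "__main__":
    K, d = 2, 0.8
    c2 = 0.4143 * (1.0 - d)  # binary upper bound reviewed in Section II
    for N in (10, 100, 1000, 10**6):
        print(N, finite_length_bound(N, K, d, c2) - limiting_bound(K, d, c2))
```

The printed gap is exactly $2K\log(N+1)/N$, which decays to zero and therefore does not affect the capacity bound.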

V. Some Implications

The trivial upper bound on the capacity of the $2K$-ary deletion channel is given by $(1-d)\log(2K)$, which is the capacity of the $2K$-ary erasure channel. In fact, if we reveal the positions of the dropped symbols as side information to the receiver of a $2K$-ary deletion channel, the resulting genie-aided channel is nothing but a $2K$-ary erasure channel.

We have shown in the previous section that substituting any upper bound on the capacity of the binary deletion channel into (5) yields an upper bound on the $2K$-ary deletion channel capacity. Obviously, by employing $C_2(d) \le 1-d$, which is the trivial upper bound on the binary deletion channel capacity, the erasure channel upper bound on the $2K$-ary deletion channel capacity is obtained. Therefore, any upper bound tighter than $1-d$ on the binary deletion channel capacity gives an upper bound tighter than $\log(2K)(1-d)$ on the $2K$-ary deletion channel capacity. The amount of improvement is $1 - d - C_2^{UB}(d)$, where $C_2^{UB}(d)$ denotes the upper bound on the binary deletion channel capacity.

As shown in [10], $(1-d)\log(2K) - 1 \le C_{2K}(d) \le (1-d)\log(2K)$; hence the existing trivial upper and lower bounds are tight enough for asymptotically large values of $K$, and i.i.d. input sequences are sufficient to achieve the capacity. However, the importance of the result in Theorem 1 is for moderate values of $K$, where the amount of improvement in closing the gap between the existing upper and lower bounds is significant.

To demonstrate the improvement over the trivial erasure channel upper bound, we compare the upper bound $C_{2K}(d) \le C_2^{UB}(d) + (1-d)\log(K)$ with the erasure channel upper bound $\log(2K)(1-d)$ and the tightest existing lower bound (2) (provided in [1]) in Fig. 2 for 4-ary and 8-ary deletion channels. Here we utilize the binary deletion channel capacity upper bounds $C_2^{UB}(d)$ reviewed in Section II: for $d < 0.65$ we use the results in [9, Table III], and for $d \ge 0.65$ we use the upper bound $C_2(d) \le 0.4143(1-d)$.

Another implication of the result in Theorem 1 is in studying the asymptotic behavior of the $2K$-ary deletion channel capacity for $d \to 0$. It is shown in [12] that

$$C_2(d) = 1 + d\log(d) - A_1 d + A_2 d^2 + O(d^{3-\epsilon}) \tag{13}$$

for small $d$ and any $\epsilon > 0$, with $A_1 \approx 1.15416377$, $A_2 \approx 1.78628364$ and $O(\cdot)$ denoting the standard Landau (big O) notation. Employing this result in (5) leads to an upper bound expansion for small values of $d$ as

$$C_{2K}(d) \le 1 + d\log(d) - \big(A_1 + \log(K)\big)d + A_2 d^2 + \log(K) + O(d^{3-\epsilon}). \tag{14}$$

In Fig. 3, we compare the above upper bound (ignoring the $O(d^{3-\epsilon})$ term) with the lower bound (2) for $d \le 0.1$. We observe that by employing the capacity expansion (13) in (5), a better characterization of the asymptotic behavior of the $2K$-ary deletion channel capacity is obtained as $d \to 0$.
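
The small-$d$ expansions (13) and (14) can be evaluated directly once the $O(d^{3-\epsilon})$ term is dropped. The following Python sketch does so under the assumption of base-2 logarithms; the function names are hypothetical, and the truncated expressions are approximations rather than rigorous bounds.

```python
import math

A1 = 1.15416377  # constants quoted with (13)
A2 = 1.78628364


def binary_capacity_expansion(d: float) -> float:
    """Truncated expansion (13): 1 + d log(d) - A1 d + A2 d^2, for d > 0."""
    return 1.0 + d * math.log2(d) - A1 * d + A2 * d * d


def nonbinary_upper_bound_expansion(d: float, K: int) -> float:
    """Truncated expansion (14) for the 2K-ary deletion channel, for d > 0."""
    log_k = math.log2(K)
    return 1.0 + d * math.log2(d) - (A1 + log_k) * d + A2 * d * d + log_k


if __name__ == "__main__":
    for d in (0.01, 0.05, 0.1):
        print(d, nonbinary_upper_bound_expansion(d, K=2), (1.0 - d) * math.log2(4))
```

By construction, the two truncated expressions differ by exactly $(1-d)\log(K)$, mirroring (5).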

VI. Conclusions 

We have derived the first non-trivial upper bound on the $2K$-ary deletion channel capacity. We first considered the $2K$-ary deletion channel as a parallel concatenation of $K$ independent binary deletion channels, all with the same deletion probability. We then related the capacity of the original channel to that of the binary deletion channel. By doing so, we obtained an upper bound on the capacity of the $2K$-ary deletion channel in terms of the capacity of the binary deletion channel and, as a result, in terms of any upper bound on the capacity of the binary deletion channel. The provided upper bound results in tighter upper bounds than the trivial erasure channel upper bound for the entire range of the deletion probability $d$ and all $K > 0$.
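
Fig. 2. Comparison of the new upper bound, the existing lower bound, and the trivial erasure channel upper bound for the 4-ary and 8-ary deletion channels.

Fig. 3. Comparison between the upper bound (14) (ignoring the $O(d^{3-\epsilon})$ term) and the lower bound (2).

Appendix A
Proof of Inequality (11)

It follows from the inequality

$$\log\binom{m}{m_1} \le m\log(m) - m_1\log(m_1) - (m-m_1)\log(m-m_1)$$

given in [13, p. 353] that

$$\begin{aligned}
\log\binom{m}{m_1,\ldots,m_K} &= \sum_{k=1}^{K-1} \log\binom{\sum_{j=k}^{K} m_j}{m_k} \\
&\le \sum_{k=1}^{K-1}\left[\left(\sum_{j=k}^{K} m_j\right)\log\left(\sum_{j=k}^{K} m_j\right) - m_k\log(m_k) - \left(\sum_{j=k+1}^{K} m_j\right)\log\left(\sum_{j=k+1}^{K} m_j\right)\right] \\
&= m\log(m) - \sum_{k=1}^{K} m_k\log(m_k).
\end{aligned}$$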
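
Inequality (11) can be spot-checked with exact integer arithmetic. The sketch below assumes base-2 logarithms and the convention $0\log 0 = 0$; the function names are hypothetical.

```python
import math


def log2_multinomial(counts) -> float:
    """log2 of the multinomial coefficient m! / (m_1! ... m_K!)."""
    value = math.factorial(sum(counts))
    for c in counts:
        value //= math.factorial(c)  # each partial quotient is an integer
    return math.log2(value)


def entropy_style_bound(counts) -> float:
    """Right-hand side of (11): m log(m) - sum m_k log(m_k)."""
    def xlogx(v):
        return v * math.log2(v) if v > 0 else 0.0
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)


if __name__ == "__main__":
    for counts in [(3, 2, 1), (5, 5), (1, 1, 1, 1), (10, 0, 4)]:
        print(counts, log2_multinomial(counts), entropy_style_bound(counts))
```

In every case the first printed value stays below the second, as (11) requires.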



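Appendix B
Concavity of $g([m_1,\ldots,m_K])$

For the Hessian of $g([m_1,\ldots,m_K])$, we have (up to the positive constant factor $\log(e)$ when the logarithm is not natural)

$$\nabla^2 g([m_1,\ldots,m_K]) = \frac{1}{\sum_{k=1}^{K} m_k}\,\mathbf{1}\mathbf{1}^T - \mathrm{diag}\left\{\frac{1}{m_1},\ldots,\frac{1}{m_K}\right\},$$

where $\mathbf{1}$ is an all-one vector of length $K$, i.e., $\mathbf{1} = [1,\ldots,1]^T$, and $\mathrm{diag}\left\{\frac{1}{m_1},\ldots,\frac{1}{m_K}\right\}$ denotes a diagonal matrix whose $k$-th diagonal element is $\frac{1}{m_k}$. Furthermore, by defining $\mathbf{a} = [a_1,\ldots,a_K]$, we can write

$$\begin{aligned}
\mathbf{a}\,\nabla^2 g\,\mathbf{a}^T &= \frac{\left(\sum_{k=1}^{K} a_k\right)^2}{\sum_{k=1}^{K} m_k} - \sum_{k=1}^{K}\frac{a_k^2}{m_k} \\
&= \frac{1}{\sum_{k=1}^{K} m_k}\left[\sum_{k=1}^{K} a_k^2 + 2\sum_{k=1}^{K-1}\sum_{j=k+1}^{K} a_k a_j - \left(\sum_{k=1}^{K} m_k\right)\left(\sum_{k=1}^{K}\frac{a_k^2}{m_k}\right)\right] \\
&= -\frac{1}{\sum_{k=1}^{K} m_k}\sum_{k=1}^{K-1}\sum_{j=k+1}^{K}\left(\frac{m_j}{m_k}a_k^2 + \frac{m_k}{m_j}a_j^2 - 2a_k a_j\right) \\
&= -\frac{1}{\sum_{k=1}^{K} m_k}\sum_{k=1}^{K-1}\sum_{j=k+1}^{K}\frac{m_j}{m_k}\left(a_k - \frac{m_k}{m_j}a_j\right)^2,
\end{aligned}$$

which is non-positive for all $m_k, m_j > 0$. Therefore, $\nabla^2 g([m_1,\ldots,m_K])$ is a negative semi-definite matrix and, as a result, $g([m_1,\ldots,m_K])$ is a concave function of $[m_1,\ldots,m_K]$.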
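
The quadratic-form identity used above can be checked numerically on random vectors. The following Python sketch compares the direct expression with the sum-of-squares form; the function names and the sampling ranges are assumptions of this example.

```python
import random


def hessian_quadratic_form(a, m):
    """Direct form: (sum a_k)^2 / (sum m_k) - sum a_k^2 / m_k."""
    return sum(a) ** 2 / sum(m) - sum(ak * ak / mk for ak, mk in zip(a, m))


def pairwise_form(a, m):
    """Sum-of-squares form: -(1 / sum m_k) * sum_{k<j} (m_j / m_k) *
    (a_k - (m_k / m_j) a_j)^2, which is visibly non-positive."""
    acc = 0.0
    for k in range(len(m) - 1):
        for j in range(k + 1, len(m)):
            acc += (m[j] / m[k]) * (a[k] - (m[k] / m[j]) * a[j]) ** 2
    return -acc / sum(m)


if __name__ == "__main__":
    rng = random.Random(0)
    m = [rng.uniform(0.5, 50.0) for _ in range(4)]
    a = [rng.uniform(-10.0, 10.0) for _ in range(4)]
    print(hessian_quadratic_form(a, m), pairwise_form(a, m))
```

The two values agree up to rounding, and the quadratic form vanishes when $\mathbf{a}$ is proportional to $[m_1,\ldots,m_K]$.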